From Segmentation to Analyses: A Probabilistic Model for Unsupervised Morphology Induction

Abstract

A major motivation for unsupervised morphological analysis is to reduce the sparse data problem in under-resourced languages. Most previous work focuses on segmenting surface forms into their constituent morphs (e.g., taking: tak +ing), but surface form segmentation does not solve the sparse data problem as the analyses of take and taking are not connected to each other. We extend the MorphoChains system (Narasimhan et al., 2015) to provide morphological analyses that can abstract over spelling differences in functionally similar morphs. These analyses are not required to use all the orthographic material of a word (stopping: stop +ing), nor are they limited to only that material (acidified: acid +ify +ed). On average across six typologically varied languages our system has a similar or better F-score on EMMA (a measure of underlying morpheme accuracy) than three strong baselines; moreover, the total number of distinct morphemes identified by our system is on average 12.8% lower than for Morfessor (Virpioja et al., 2013), a state-of-the-art surface segmentation system.
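The contrast between surface segmentation and morphological analysis can be illustrated with a small toy sketch. It is not the authors' model and not the output of any real system: the two tables are hand-written, the helper names `distinct_morphs` and `shares_stem` are hypothetical, and the entries marked "assumed" are guesses added for illustration. Only `tak +ing` and `stop +ing` come from the abstract.

```python
# Toy illustration only: hand-written entries, not output from any real system.

# Surface segmentations must concatenate back to the word's exact spelling.
surface = {
    "take": ["take"],
    "taking": ["tak", "ing"],      # example from the abstract
    "stop": ["stop"],
    "stopping": ["stopp", "ing"],  # assumed surface split, for illustration
}

# Analyses may drop or restore orthographic material.
analyses = {
    "take": ["take"],
    "taking": ["take", "ing"],     # assumed: restores the stem "take"
    "stop": ["stop"],
    "stopping": ["stop", "ing"],   # example from the abstract
}


def distinct_morphs(table):
    """Return the set of distinct morph or morpheme labels used in a table."""
    return {m for morphs in table.values() for m in morphs}


def shares_stem(table, a, b):
    """Check whether two words have the same first unit in a table."""
    return table[a][0] == table[b][0]


if __name__ == "__main__":
    for name, table in (("surface", surface), ("analyses", analyses)):
        print(name, sorted(distinct_morphs(table)),
              "take/taking linked:", shares_stem(table, "take", "taking"))
```

In this toy data the surface table keeps `take` and `tak` as unrelated units, while the analysis table links both words through one stem and therefore needs fewer distinct labels. That is the kind of reduction in distinct morphemes the abstract reports, although the 12.8% figure comes from the paper's experiments and cannot be derived from this sketch.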
